Conformal prediction is a framework for creating prediction intervals or sets with statistical coverage guarantees.
Conditional Coverage Diagnostic: empirically test whether the prediction sets achieve the intended coverage. Repeat the calibration/validation split \(R\) times and record the empirical coverage of each run: \[C_j = \frac{1}{n_{\text{val}}} \sum_{i=1}^{n_{\text{val}}} \mathbf{1}\left\{ Y_{i,j}^{(\text{val})} \in C_j\left(X_{i,j}^{(\text{val})}\right) \right\}, \quad \text{for } j = 1, \ldots, R,\]
Approximate test: \[\overline{C} = \frac{1}{R} \sum_{j=1}^R C_j \approx 1 - \alpha\]
This checks the marginal guarantee: averaged over test instances \(X_{\text{test}}\) (not for each instance individually), the probability that the prediction set contains the true label is at least \(1 - \alpha\).
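The diagnostic above can be sketched without a trained model by using synthetic conformal scores (here uniform on \([0, 1]\), standing in for one minus the softmax of the true class). Each of the \(R\) runs draws a fresh calibration/validation split and records its empirical coverage \(C_j\):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n_cal, n_val, R = 0.1, 500, 500, 100

coverages = []
for _ in range(R):
    # fresh calibration and validation scores for run j
    cal_scores = rng.uniform(size=n_cal)
    val_scores = rng.uniform(size=n_val)
    # conservative (n + 1)-adjusted quantile of the calibration scores
    q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
    qhat = np.quantile(cal_scores, q_level, method="higher")
    # empirical coverage C_j on this run's validation split
    coverages.append(np.mean(val_scores <= qhat))

print(f"mean coverage over {R} runs: {np.mean(coverages):.3f}")  # ≈ 1 - alpha
```

Averaged over runs, \(\overline{C}\) should land close to \(1 - \alpha = 0.9\), while the individual \(C_j\) fluctuate around it.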
Adjustments:
Calibration Set Size:
Key idea: the coverage attained by conformal prediction, conditional on the calibration set, is itself a random quantity; its distribution depends on the calibration set size \(n\).
Coverage Distribution: \[P\left(Y_{\text{test}} \in C(X_{\text{test}}) \mid \{(X_i, Y_i)\}_{i=1}^n\right) \sim \text{Beta}(n+1-l, l), \quad l = \lfloor (n+1)\alpha \rfloor\]
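A quick sketch of what this distribution implies for a concrete calibration set size, using SciPy (for \(n = 500\) and \(\alpha = 0.1\), \(l = 50\), so coverage conditional on the calibration set follows \(\text{Beta}(451, 50)\)):

```python
import numpy as np
from scipy.stats import beta

n, alpha = 500, 0.1
l = int(np.floor((n + 1) * alpha))  # l = 50
dist = beta(n + 1 - l, l)           # Beta(451, 50)

print(f"mean coverage: {dist.mean():.4f}")  # 451/501 ≈ 0.9002
print(f"90% interval: [{dist.ppf(0.05):.4f}, {dist.ppf(0.95):.4f}]")
```

Increasing \(n\) tightens this Beta distribution around \(1 - \alpha\), so larger calibration sets give coverage that is more reliably close to the nominal level.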
import os
os.environ["KERAS_BACKEND"] = "torch"
import keras
from keras import layers
from keras.datasets import mnist
from sklearn.model_selection import train_test_split
import numpy as np
# 1. Load and preprocess the MNIST data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[:5_000]
y_train = y_train[:5_000]  # keep labels aligned with the subsampled images
x_train = x_train.reshape((x_train.shape[0], 28 * 28)).astype("float32") / 255
x_test = x_test.reshape((x_test.shape[0], 28 * 28)).astype("float32") / 255
x_test, x_cal, y_test, y_cal = train_test_split(
    x_test, y_test, test_size=0.5, random_state=42
)
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
# 2. Build a simple model
model = keras.Sequential(
    [
        keras.Input(shape=(784,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ]
)
# 3. Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# 4. Train the model
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_split=0.1)
# 5. Evaluate on the test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print("Test accuracy:", test_acc)  # Test accuracy: 0.83

# 1: get conformal scores
n = y_cal.shape[0]
cal_smx = model.predict(x_cal, verbose=0)  # already probabilities: the final layer is softmax
cal_scores = 1 - cal_smx[np.arange(n), y_cal]  # conformal score: 1 minus softmax of the true class
# 2: get adjusted quantile
alpha = 0.1
q_level = np.ceil((n + 1) * (1 - alpha)) / n
qhat = np.quantile(cal_scores, q_level, method="higher")
test_smx = model.predict(x_test, verbose=0)
# 3: form prediction sets
prediction_sets = test_smx >= (1 - qhat)

for i in range(5):
    sets = np.where(prediction_sets[i])[0]
    label = np.argmax(y_test[i])
    print(f"Prediction set for image {i}: {sets}, True label: {label}")

Prediction set for image 0: [8], True label: 8
Prediction set for image 1: [4 9], True label: 4
Prediction set for image 2: [3 5], True label: 3
Prediction set for image 3: [1], True label: 1
Prediction set for image 4: [2], True label: 2
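As a sanity check that the three steps behave as intended end-to-end, the same pipeline can be run on synthetic data (the data-generating process below is ours, chosen only so the softmax is informative about the label); by exchangeability the empirical coverage should land near \(1 - \alpha\):

```python
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_test, num_classes, alpha = 1000, 1000, 10, 0.1

def synthetic_batch(n):
    # random labels, with the true class's logit boosted so the
    # softmax carries signal about the label
    labels = rng.integers(num_classes, size=n)
    logits = rng.normal(size=(n, num_classes))
    logits[np.arange(n), labels] += 2.0
    smx = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return smx, labels

cal_smx, y_cal_syn = synthetic_batch(n_cal)
test_smx, y_test_syn = synthetic_batch(n_test)

# steps 1-3 exactly as above
cal_scores = 1 - cal_smx[np.arange(n_cal), y_cal_syn]
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(cal_scores, q_level, method="higher")
prediction_sets = test_smx >= (1 - qhat)

coverage = prediction_sets[np.arange(n_test), y_test_syn].mean()
print(f"empirical marginal coverage: {coverage:.3f}")  # ≈ 0.9
```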
More formally, given image-class pairs \(\{(X_i, Y_i)\}_{i=1}^n\) and an image classifier \(\hat{f}\), we seek:
\[ \mathbb{P}\left( Y_{\text{test}} = \hat{Y}(X_{\text{test}}) \;\middle|\; \hat{P}(X_{\text{test}}) \geq \hat{\lambda} \right) \geq 1 - \alpha, \]
where \(\hat{Y}(x) = \arg\max_y \hat{f}(x)_y\), \(\hat{P}(X_{\text{test}}) = \max_y \hat{f}(X_{\text{test}})_y\), and \(\hat{\lambda}\) is a threshold chosen using the calibration set.
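One simple way to pick \(\hat{\lambda}\) is sketched below on synthetic data (the data-generating process is ours, and this plain threshold scan carries no finite-sample guarantee; risk-controlling procedures such as Learn then Test are needed for that): take the smallest threshold whose empirical selective accuracy on the calibration set reaches \(1 - \alpha\).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n = 0.1, 2000

# synthetic calibration data: top-class confidences P_hat(X_i), and
# correctness simulated so that more confident predictions are right more often
conf = rng.uniform(0.5, 1.0, size=n)
correct = rng.uniform(size=n) < conf

lam_hat = None
for lam in np.sort(conf):  # scan candidate thresholds from low to high
    keep = conf >= lam
    if correct[keep].mean() >= 1 - alpha:
        lam_hat = lam  # smallest threshold meeting the accuracy target
        break

keep = conf >= lam_hat
print(f"lambda_hat = {lam_hat:.3f}, "
      f"selective accuracy = {correct[keep].mean():.3f}, "
      f"fraction kept = {keep.mean():.3f}")
```

The trade-off is explicit: a higher \(\hat{\lambda}\) buys higher accuracy on the retained predictions at the cost of abstaining on more inputs.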